Poster: flippertv Date: Mar 7, 2009 5:33pm
Forum: web Subject: robots.txt

How can I allow the Internet Archive to crawl my subdomains while excluding every other search engine (Google and Yahoo, for example) using www.yourdomain.com/robots.txt?

Poster: kustota Date: Mar 8, 2009 9:10am
Forum: web Subject: Re: robots.txt

User-agent: *
Disallow: /

User-agent: ia_archiver
Allow: /

Poster: bmuramatsu Date: Aug 7, 2011 4:05pm
Forum: web Subject: Re: robots.txt

Hi, is IA actually following the Allow directive?

I've had my robots.txt file configured as below, but IA still doesn't seem to index my site.

Thanks in advance.

===

User-agent: *
Disallow: /

User-agent: ia_archiver
Allow: /

User-agent: archive.org_bot
Allow: /
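
One way to rule out a syntax problem in the file itself is to run it through a standards-style parser. The snippet below is only a rough sketch using Python's standard-library urllib.robotparser; the helper name can_fetch, the test URL, and the "Googlebot" and "Slurp" agent strings are assumptions chosen for illustration. It shows how Python's parser reads these rules, which is not necessarily how the Internet Archive's crawler reads them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, inlined so the check runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Allow: /

User-agent: archive.org_bot
Allow: /
"""


def can_fetch(agent, url="http://www.yourdomain.com/index.html"):
    """Return True if Python's parser lets `agent` fetch `url`.

    The default URL is a hypothetical page, not a real site.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Two Internet Archive agents from the file, plus two illustrative others.
    for agent in ("ia_archiver", "archive.org_bot", "Googlebot", "Slurp"):
        print(agent, can_fetch(agent))
```

If the two Internet Archive agents come back allowed and the others blocked, the file is at least well formed, and the missing captures are more likely a matter of crawl timing or crawler behavior than of the rules.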

Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Nov 7, 2013 1:12pm
Forum: web Subject: Re: robots.txt

Hello bmuramatsu, did you manage to whitelist the Wayback Machine with the Allow directive in the end? What's your website?

Edit: see also Google's robots.txt, an example of a file that is not respected as intended: https://archive.org/post/1004436

This post was modified by Nemo_bis on 2013-11-07 21:12:19

Poster: jonc Date: Aug 7, 2011 6:39pm
Forum: web Subject: Re: robots.txt

You'll be better off starting a new thread instead of replying to one that is two years old. Old threads get buried and won't be seen by anybody casually browsing the forums.

It takes a few months to crawl the entire Web, so it might be a while before your site is archived. You'll just have to be patient.